Abstract

Recently, it has been observed that a sparse trigonometric polynomial, i.e. having only 
a small number of non-zero coefficients, can be reconstructed exactly from a small number 
of random samples using Basis Pursuit (BP) or Orthogonal Matching Pursuit (OMP). In 
the present article it is shown that recovery by a BP variant is stable under perturbation
of the sample values by noise. A similar partial result for OMP is provided. For BP in
addition, the stability result is extended to (non-sparse) trigonometric polynomials that can 
be well-approximated by sparse ones. The theoretical findings are illustrated by numerical 
experiments. 
1 Introduction

Over the recent years compressed sensing has become a rapidly developing research field, see 
e.g. [1, 4, 8, 10, 30, 34]. In their seminal papers [4, 5, 6] Candes, Romberg and Tao observed 
that it is possible to recover sparse vectors, i.e., vectors having only few non-vanishing coefficients, from
a number of measurements that is small compared to the ambient dimension of the vector. As
reconstruction method they promoted $\ell_1$-minimization, also referred to as Basis Pursuit (BP)
[7]. Their results apply in particular to recovery of a sparse vector from (random) samples of
its discrete Fourier transform. In [28] the author extended their result to the situation where 
samples of the corresponding trigonometric polynomial are taken at random from the uniform 
(continuous) distribution on the cube, i.e., the samples are chosen "off the grid". 

Another line of research suggests Orthogonal Matching Pursuit (OMP) as recovery method 
[15, 21, 33]. This is a greedy algorithm which is significantly faster than BP in practice. Partial 
results in [21] indicate that also OMP is able to recover a sparse trigonometric polynomial from 
few random samples. Moreover, numerical experiments suggest that OMP usually has a slightly 
higher probability of recovery success than BP - although BP has some theoretical advantages. 

In practice, it is important that recovery methods are stable in the presence of noise on 
the measurements. Candes et al. showed in [5] that (a variant of) BP is indeed stable under 
a certain condition on the measurement matrix involving the so called restricted isometry 
constants. An estimation of these constants for the measurement matrix corresponding to
random samples of the discrete Fourier transform was provided in [6] and [30]. In the present
article we extend this estimate to the case of random samples at uniformly distributed points
on the cube $[0,2\pi]^d$.

We further provide partial results indicating that also OMP is stable under perturbation 
of the measurements by noise. Finally, numerical experiments reveal that the average recon- 
struction error of OMP is usually smaller than for (the variant of) BP in the presence of 
noise. 

After the first submission of this manuscript, variants of OMP - Regularized Orthogonal 
Matching Pursuit (ROMP) [26, 25] and CoSaMP [24] - were introduced, that achieve similar 
theoretical recovery and stability guarantees as Basis Pursuit and are even slightly faster than 
OMP. Since the analysis of these algorithms is based on the restricted isometry constants our 
estimates for the Fourier type measurement matrix are useful for the analysis of ROMP and 
CoSaMP as well. 

The paper is organized as follows. Section 2 gives some background on prior work, intro- 
duces notation and describes our problem. In Section 3 we present our main results concerning 
stability of a variant of BP, while Section 4 states stability theorems for OMP. Section 5 presents 
the proofs for BP, and Section 6 deals with the ones for OMP. The numerical experiments are 
detailed in Section 7. Finally, we conclude in Section 8 with a discussion. 

2 Prior Work and Problem Statement 

For some finite subset $\Gamma\subset\mathbb{Z}^d$, $d\in\mathbb{N}$, we let $\Pi_\Gamma$ denote the space of all trigonometric polynomials in dimension $d$ whose coefficients are supported on $\Gamma$. An element $f$ of $\Pi_\Gamma$ is of the form $f(x) = \sum_{k\in\Gamma}c_k e^{ik\cdot x}$, $x\in[0,2\pi]^d$, with Fourier coefficients $c_k\in\mathbb{C}$. The dimension of $\Pi_\Gamma$ will be denoted by $D := |\Gamma|$. One may imagine $\Gamma = \{-q,-q+1,\dots,q-1,q\}^d$, but actually arbitrary sets $\Gamma$ are possible.

We will mainly deal with "sparse" trigonometric polynomials, i.e., we assume that the sequence of coefficients $c_k$ is supported only on a small set $T\subset\Gamma$. However, a priori nothing is known about $T$ apart from a maximum size. Thus, it is useful to introduce the (nonlinear) set $\Pi_\Gamma(M)\subset\Pi_\Gamma$ of all trigonometric polynomials whose Fourier coefficients are supported on a set $T\subset\Gamma$ satisfying $|T|\le M$, $\Pi_\Gamma(M) = \bigcup_{T\subset\Gamma,\,|T|\le M}\Pi_T$.

Our aim is to reconstruct an element $f\in\Pi_\Gamma(M)$ from sample values $f(x_1),\dots,f(x_N)$,
where the number $N$ of sampling points $x_1,\dots,x_N\in[0,2\pi]^d$ is small compared to the dimension
$D$ (but, of course, larger than the sparsity $M$). As suggested by [4, 6, 15, 21, 28] we
will study the behaviour of two reconstruction methods: Basis Pursuit (BP) and Orthogonal 
Matching Pursuit (OMP). 

BP was much promoted by Donoho and his coworkers, see e.g. [7, 13]. It consists in solving the following $\ell_1$-minimization problem

$$\min\|(c_k)_{k\in\Gamma}\|_1 = \sum_{k\in\Gamma}|c_k|\quad\text{subject to}\quad\sum_{k\in\Gamma}c_k e^{ik\cdot x_j} = f(x_j),\quad j = 1,\dots,N. \tag{2.1}$$

This task can be performed with convex optimization techniques [2]. Recently, much effort has been dedicated to the development of fast algorithms specialized to $\ell_1$-minimization, see e.g. [9, 14, 18].
Algorithm 1 OMP

Input: sampling set $X\subset[0,2\pi]^d$, sampling vector $\mathbf{f} := (f(x_j))_{j=1}^N$, set $\Gamma\subset\mathbb{Z}^d$.
Optional: maximum allowed sparsity $M$ and/or residual tolerance $\epsilon$.

1: Set $s = 0$, the residual vector $r_0 = \mathbf{f}$, and the index set $T_0 = \emptyset$.
2: repeat
3:   Set $s = s+1$.
4:   Find $k_s = \arg\max_{k\in\Gamma}|\langle r_{s-1},\phi_k\rangle|$ and augment $T_s = T_{s-1}\cup\{k_s\}$.
5:   Project onto $\operatorname{span}\{\phi_k,\ k\in T_s\}$ by solving the least squares problem $\|\mathcal{F}_{T_sX}d_s-\mathbf{f}\|_2\to\min$.
6:   Compute the new residual $r_s = \mathbf{f}-\mathcal{F}_{T_sX}d_s$.
7: until $s = M$ or $\|r_s\|_2\le\epsilon$
8: Set $T = T_s$; the non-zeros of the vector $c$ are given by $(c_k)_{k\in T} = d_s$.

Output: vector of coefficients $(c_k)_{k\in\Gamma}$ and its support $T$.
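The following is a minimal NumPy sketch of Algorithm 1, assuming the matrix $\mathcal{F}_X$ is formed explicitly as a dense array rather than applied via FFT/NFFT. The function name `omp` and its arguments are our own choices, not code from [21]; step 5 is delegated to `numpy.linalg.lstsq`.

```python
import numpy as np

def omp(F, f, M=None, tol=None):
    """Orthogonal Matching Pursuit along the lines of Algorithm 1 (sketch).

    F   : (N, D) complex array whose columns are the phi_k (e.g. F_X from (2.2))
    f   : length-N array of (possibly noisy) samples
    M   : optional maximum number of iterations (maximum allowed sparsity)
    tol : optional residual tolerance (the epsilon of step 7)
    Returns the coefficient vector c (length D) and the list T of selected indices.
    """
    if M is None and tol is None:
        raise ValueError("specify M and/or tol as a stopping rule")
    f = np.asarray(f, dtype=complex)
    N, D = F.shape
    max_iter = min(N, D) if M is None else M
    r = f.copy()                                   # step 1: r_0 = f, T_0 = empty
    T = []
    d = np.zeros(0, dtype=complex)
    for _ in range(max_iter):                      # steps 2-3: repeat, s = s + 1
        corr = np.abs(F.conj().T @ r)              # |<r_{s-1}, phi_k>| for all k
        corr[T] = 0.0                              # guard against re-selection due to round-off
        T.append(int(np.argmax(corr)))             # step 4: augment the support
        d, *_ = np.linalg.lstsq(F[:, T], f, rcond=None)   # step 5: least squares on T_s
        r = f - F[:, T] @ d                        # step 6: new residual
        if tol is not None and np.linalg.norm(r) <= tol:
            break                                  # step 7: residual small enough
    c = np.zeros(D, dtype=complex)                 # step 8: embed d_s into a length-D vector
    c[T] = d
    return c, T
```

Each iteration costs a dense $O(ND)$ product for the correlations; replacing `F.conj().T @ r` by an (N)FFT-based adjoint is the kind of speed-up referred to in the text.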



OMP is a greedy algorithm [23, 33], which selects a new element of the support set $T$ in each step, see Algorithm 1. Its precise formulation uses the following notation. Let $X = (x_1,\dots,x_N)$ be the sequence of sampling points. We denote by $\mathcal{F}_X$ the $N\times D$ matrix with entries

$$(\mathcal{F}_X)_{j,k} = e^{ik\cdot x_j},\qquad 1\le j\le N,\ k\in\Gamma. \tag{2.2}$$

Then clearly, $f(x_j) = (\mathcal{F}_Xc)_j$ if $c$ is the vector of Fourier coefficients of $f$. Let $\phi_k$ denote the $k$-th column of $\mathcal{F}_X$, i.e., $\phi_k = (e^{ik\cdot x_j})_{j=1}^N$. The restriction of $\mathcal{F}_X$ to the columns indexed by $T$ is denoted by $\mathcal{F}_{TX}$. Furthermore, let $\langle\cdot,\cdot\rangle$ denote the usual Euclidean scalar product and $\|\cdot\|_2$ the associated norm. We have $\|\phi_k\|_2 = \sqrt{N}$ for all $k\in\Gamma$, i.e., all the columns of $\mathcal{F}_X$ have the same $\ell_2$-norm. For details on the implementation of OMP we refer to [21]. We only note that the fast Fourier transform (FFT) or the non-equispaced fast Fourier transform (NFFT), see
e.g. [27] and the references therein, can be used for speed-ups of OMP.
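To make definition (2.2) concrete, the sketch below builds $\mathcal{F}_X$ densely with NumPy for arbitrary $d$ and checks the two facts just stated: $f(x_j) = (\mathcal{F}_Xc)_j$ and $\|\phi_k\|_2 = \sqrt{N}$. The helper name `sampling_matrix` and the toy sizes are hypothetical choices for illustration.

```python
import numpy as np

def sampling_matrix(X, Gamma):
    """Return the N x D matrix F_X with entries exp(i k . x_j), cf. (2.2).

    X     : sampling points, shape (N, d) (a 1-D array is read as d = 1)
    Gamma : frequency set, shape (D, d) (a 1-D array is read as d = 1)
    """
    X = np.asarray(X, dtype=float)
    Gamma = np.asarray(Gamma, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    if Gamma.ndim == 1:
        Gamma = Gamma[:, None]
    return np.exp(1j * (X @ Gamma.T))      # (j, k) entry: exp(i <k, x_j>)

# Example with d = 1, Gamma = {-2, ..., 2} and N = 7 random points
rng = np.random.default_rng(0)
Gamma = np.arange(-2, 3)
X = rng.uniform(0.0, 2 * np.pi, size=7)
F = sampling_matrix(X, Gamma)
c = np.array([0.0, 1.0, 0.0, 2.0 - 1.0j, 0.0])                    # coefficients c_k, k in Gamma
f_direct = np.array([np.sum(c * np.exp(1j * Gamma * x)) for x in X])  # f(x_j) evaluated directly
```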

Since it seems to be very hard to come up with deterministic recovery results, we model the sampling points $x_1,\dots,x_N$ as random variables. To this end we use two probability models.

(1) The sampling points $x_1,\dots,x_N$ are independent random variables having the uniform distribution on the cube $[0,2\pi]^d$.

(2) The sampling points $x_1,\dots,x_N$ are independent random variables having the uniform distribution on the grid $\{0,2\pi/m,\dots,2\pi(m-1)/m\}^d$. Here, it is implicitly assumed that $\Gamma\subset\{0,1,\dots,m-1\}^d$.

Model (1) will also be referred to as the continuous model, while the second will be called "discrete". Observe that with model (2) it might happen with non-zero probability that some sampling points are selected more than once. To overcome this problem one might also choose the sampling set uniformly at random among all subsets of the grid $\{0,2\pi/m,\dots,2\pi(m-1)/m\}^d$ of size $N$. This model was actually used in [4, 6, 30]. However, for technical reasons we work with the model (2) here. Intuitively, moving from model (2) to its variant should actually improve the situation since always a maximum of information is used.
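The two probability models, together with the without-replacement variant of model (2), can be sketched with NumPy's random `Generator` as follows. The function names are hypothetical; the grid size $m$ is the parameter of model (2).

```python
import numpy as np

def sample_continuous(N, d, rng):
    """Model (1): N i.i.d. points, uniform on the cube [0, 2*pi]^d."""
    return rng.uniform(0.0, 2 * np.pi, size=(N, d))

def sample_grid(N, d, m, rng):
    """Model (2): N i.i.d. points, uniform on {0, 2pi/m, ..., 2pi(m-1)/m}^d.
    Repeated points are possible."""
    return 2 * np.pi * rng.integers(0, m, size=(N, d)) / m

def sample_grid_without_replacement(N, d, m, rng):
    """Variant of model (2): a uniformly random N-subset of the grid (no repeats)."""
    flat = rng.choice(m ** d, size=N, replace=False)            # distinct flat grid indices
    idx = np.stack(np.unravel_index(flat, (m,) * d), axis=1)     # multi-indices, shape (N, d)
    return 2 * np.pi * idx / m
```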
In [28] it was proven that BP is able to recover a sparse trigonometric polynomial from a rather small number of sample values.

Theorem 2.1. Let $T\subset\Gamma$ with $|T|\le M$. Let $x_1,\dots,x_N$ be random variables chosen according to the probability model (1) or (2). Assume that

$$N\ge CM\log(D/\varepsilon). \tag{2.3}$$

Then with probability at least $1-\varepsilon$ BP recovers exactly all $f\in\Pi_\Gamma(M)$ with coefficients supported on $T$ from the sample values $f(x_j)$, $j = 1,\dots,N$. The constant $C$ is absolute.

The above theorem is non-uniform in the sense that for a single sampling set X recovery is 
guaranteed only for the given support set T (but for all Fourier coefficients supported on T). 
By Theorem 3.2 to be shown later it follows that this drawback can be removed, i.e., recovery 
by BP can be made fully uniform by introducing additional log factors to condition (2.3). 

Recovery by OMP was studied theoretically and numerically in [21], although the theoret- 
ical results are only partial so far. At least the first step of OMP could be analyzed: 

Theorem 2.2. Let $f\in\Pi_\Gamma(M)$ with coefficients supported on $T$. Choose random sampling points $x_1,\dots,x_N$ according to one of our two probability models. If

$$N\ge CM\log(D/\varepsilon),$$

then with probability at least $1-\varepsilon$ OMP selects an element of the true support $T$ in the first iteration.

The numerical experiments conducted in [21] suggest that also the further steps of OMP 
select elements of the true support T, so that after M steps the correct polynomial / is 
recovered. However, starting with the second step the theoretical analysis seems to be quite 
difficult due to subtle stochastic dependency issues. 

We note that the above theorem is non-uniform in the sense that the success probability 
is valid for the given polynomial, but it does not state that with high probability a single 
sampling set $\{x_1,\dots,x_N\}$ is good for all sparse trigonometric polynomials. Such a uniform
result was also provided in [21], which actually analyzes the full application of OMP, but 
requires significantly more samples. 

Theorem 2.3. Let $X = (x_1,\dots,x_N)$ be chosen according to the continuous probability model (1) or the discrete model (2). Suppose that

$$N\ge C(2M-1)^2\log(4D'/\varepsilon), \tag{2.4}$$

where $D' := \#\{j-k : j,k\in\Gamma,\ j\ne k\}\le D^2$. Then with probability at least $1-\varepsilon$ OMP recovers every $f\in\Pi_\Gamma(M)$. The constant satisfies $C\le 4+\frac{4}{3\sqrt{2}}\approx 4.94$. In case of the continuous
probability model it can be improved to $C = 4/3$.
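The quantity $D'$ and the right-hand side of (2.4) are easy to evaluate for a concrete index set. The sketch below does this by brute force in pure Python; the helper names are hypothetical, and the default constant is the upper bound $4+4/(3\sqrt{2})$ stated in Theorem 2.3 (for the continuous model one could pass `C=4/3` instead).

```python
import itertools
import math

def difference_set_size(Gamma):
    """D' = #{j - k : j, k in Gamma, j != k}; Gamma is a list of integer d-tuples."""
    Gamma = [tuple(k) for k in Gamma]
    diffs = {tuple(a - b for a, b in zip(j, k))
             for j, k in itertools.permutations(Gamma, 2)}   # ordered pairs with j != k
    return len(diffs)

def omp_uniform_sample_size(M, Gamma, eps, C=4 + 4 / (3 * math.sqrt(2))):
    """Smallest integer N with N >= C (2M - 1)^2 log(4 D' / eps), cf. (2.4)."""
    D_prime = difference_set_size(Gamma)
    return math.ceil(C * (2 * M - 1) ** 2 * math.log(4 * D_prime / eps))

# Example: Gamma = {-q, ..., q} in dimension d = 1 with q = 3
Gamma = [(k,) for k in range(-3, 4)]
```

The quadratic factor $(2M-1)^2$ dominates quickly; this is the price of uniformity discussed next.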

The above result is based on an analysis of the coherence, see also below. It seems that condition (2.4) is actually optimal up to perhaps the constant $C$ and the log-factor if one requires uniformity, i.e., recovery of all sparse trigonometric polynomials in $\Pi_\Gamma(M)$ from a single sampling set $X$, see [29]. In this regard, BP and OMP seem to be crucially different.
[Figure 1, panels: (a) Trigonometric polynomial and (noisy) samples. (b) True and recovered coefficients.]

Figure 1: Left: Trigonometric polynomial (real part) of sparsity $M = 8$ and $N = 40$ samples (o). The samples are disturbed by noise $\eta$ with $\|\eta\|_2 = 4$ (x). Right: True coefficients (o), reconstruction by BP variant (3.1) (*), reconstruction by OMP (x).



BP can give a uniform guarantee if the number of samples N scales linearly in the sparsity M 
(ignoring log-factors), see Theorem 2.1, Theorem 3.2 below and e.g. [6, 30], while OMP can
give at most a non-uniform guarantee in this range, compare also [11, Section 7] and [15]. 

In this article we treat the question whether recovery by BP and OMP is stable if the sample values $f(x_j)$ are perturbed by noise. Additionally, for BP we consider also the case that $f$ is not sparse in a strict sense, but can be well approximated by a sparse trigonometric polynomial.

In mathematical terms we assume that we observe the vector

$$y = (f(x_j))_{j=1}^N+\eta = \mathcal{F}_Xc+\eta$$

rather than $(f(x_j))_{j=1}^N = \mathcal{F}_Xc$, where the noise $\eta$ satisfies $\|\eta\|_2 = \big(\sum_{j=1}^N|\eta_j|^2\big)^{1/2}\le\sigma$ for some $\sigma>0$. We will investigate whether the difference between the original coefficient vector and the one reconstructed by OMP or BP is small. For OMP we additionally ask whether the correct support set is recovered. Figure 1 provides a first illustration by showing an example of a reconstruction by the BP variant (3.1) and OMP from noisy samples.

In the sequel, $\|\cdot\|_{p\to q}$ will denote the operator norm from the sequence space $\ell_p$ into $\ell_q$ (on some index set), and $\lfloor x\rfloor$ is the largest integer smaller than or equal to $x$. Furthermore, $C$ will always denote a generic constant, whose value might be different in each occurrence.



3 Basis Pursuit 

In the presence of noise it is useful to consider a slight variant of Basis Pursuit. Indeed, in [5] it is suggested to minimize the $\ell_1$-norm of the coefficient vector $c$ subject to the constraint that the residual error satisfies $\|\mathcal{F}_Xc-y\|_2\le\sigma$, i.e., we solve

$$\min\|c\|_1\quad\text{subject to}\quad\|\mathcal{F}_Xc-y\|_2\le\sigma. \tag{3.1}$$
Again this problem can be solved by convex optimization techniques [2]. Clearly, if $\sigma = 0$ then
we are back to the original Basis Pursuit principle (2.1).
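Because the coefficients are complex, (3.1) is not a linear program but a second-order cone program (SOCP). One standard way to write it, with auxiliary variables $t_k$ that are not part of the paper's notation, is sketched below; we make no claim that this is the formulation or solver used in Section 7.

```latex
\begin{aligned}
\min_{c\in\mathbb{C}^{\Gamma},\ t\in\mathbb{R}^{\Gamma}}\quad & \sum_{k\in\Gamma} t_k \\
\text{subject to}\quad & \big\|\big(\operatorname{Re} c_k,\ \operatorname{Im} c_k\big)\big\|_2 \le t_k, \qquad k\in\Gamma,\\
& \big\|\big(\operatorname{Re}(\mathcal{F}_X c - y),\ \operatorname{Im}(\mathcal{F}_X c - y)\big)\big\|_2 \le \sigma .
\end{aligned}
```

At an optimum $t_k = |c_k|$, so the objective equals $\|c\|_1$; each coefficient contributes one three-dimensional cone, and the residual constraint is a single $(2N+1)$-dimensional cone.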

For the problem (3.1) quite general stability results were obtained by Candes, Romberg and Tao in [5], see also [8]. Their key concept is the following definition.

Definition 3.1. The restricted isometry constant $\delta_M$ of a matrix $A$ is the smallest number such that for all subsets $T$ with $|T|\le M$ it holds

$$(1-\delta_M)\|x\|_2^2\le\|A_Tx\|_2^2\le(1+\delta_M)\|x\|_2^2 \tag{3.2}$$

for all coefficients $x$ supported on $T$. Here $A_T$ denotes the restriction of $A$ to the columns
indexed by $T$.
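For very small matrices, $\delta_M$ can be computed by brute force: for each support $T$ the best constant in (3.2) is governed by the extreme eigenvalues of $A_T^*A_T$. The sketch below (hypothetical helper name) only enumerates $|T| = M$, since by eigenvalue interlacing smaller supports cannot give a larger constant. It is exponential in $M$ and meant purely as an illustration, not as a practical test.

```python
import itertools
import numpy as np

def restricted_isometry_constant(A, M):
    """Brute-force delta_M of Definition 3.1 (feasible only for tiny D and M)."""
    D = A.shape[1]
    delta = 0.0
    for T in itertools.combinations(range(D), M):
        cols = list(T)
        G = A[:, cols].conj().T @ A[:, cols]    # Gram matrix A_T^* A_T (Hermitian)
        eig = np.linalg.eigvalsh(G)             # real eigenvalues, ascending
        delta = max(delta, 1.0 - eig[0], eig[-1] - 1.0)
    return delta
```

For a matrix with unit-norm columns, $\delta_1 = 0$ and $\delta_2$ coincides with the coherence, since the $2\times2$ Gram matrices have eigenvalues $1\pm|\langle\psi_j,\psi_k\rangle|$.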

In [5] the following theorem was proved. (Although it was originally stated only for the 
real-valued case the theorem together with its proof also holds for the complex-valued case.) 

Theorem 3.1. Assume that $A$ is some matrix for which the restricted isometry constants satisfy

$$\delta_{3M}+3\delta_{4M}<2.$$

Let $x\in\mathbb{C}^D$ and assume we have given noisy data $y = Ax+\eta$ with $\|\eta\|_2\le\sigma$. Denote by $x_M$ the truncated vector corresponding to the $M$ largest absolute values of $x$. Then the solution $x^\#$ to the problem

$$\min\|x\|_1\quad\text{subject to}\quad\|Ax-y\|_2\le\sigma$$

satisfies

$$\|x^\#-x\|_2\le C_1\sigma+C_2\frac{\|x-x_M\|_1}{\sqrt{M}}. \tag{3.3}$$

The constants $C_1$ and $C_2$ depend only on $\delta_{3M}$ and $\delta_{4M}$.

Thus, recovery by the BP variant (3.1) is stable provided the restricted isometry constants 
are small. Note that the second term in (3.3) vanishes if x is sparse, i.e., has not more than 
M non-vanishing coefficients. 

For our case this means that it is sufficient to provide conditions that ensure $\delta_{4M}\le\delta$ for some small $\delta$ with high probability. (Note that for $\delta = 1/5$ the constants in the previous theorem are actually quite well-behaved, $C_1\le 12.04$ and $C_2\le 8.77$, see [5].)

Candes and Tao [6] provided such conditions for the discrete Fourier transform with a 
slightly different probability model than our discrete model (2). More recently, Rudelson and 
Vershynin came up with a more elegant and shorter solution to this problem [30]. It is possible 
to apply their technique also to our two probability models, notably the continuous one. This 
gives the following result. 

Theorem 3.2. Let $D = |\Gamma|$ and a sparsity $M$ be given. Let $\varepsilon,\delta\in(0,1)$ and assume

$$\frac{N}{\log(N)}\ge C\delta^{-2}M\log^2(M)\log(D)\log(\varepsilon^{-1}). \tag{3.4}$$

Let the $N$ sampling points $X = (x_1,\dots,x_N)$ be chosen at random according to the model (1) or (2). Then with probability at least $1-\varepsilon$ the restricted isometry constant of the matrix $N^{-1/2}\mathcal{F}_X$ satisfies

$$\delta_M\le\delta. \tag{3.5}$$

The constant $C$ is absolute.
The combination of Theorems 3.1 and 3.2 gives the following.

Corollary 3.3. Let $\Gamma$ with $|\Gamma| = D$, $M$, $N$ and $\varepsilon$ be such that

$$\frac{N}{\log(N)}\ge C_0M\log^2(M)\log(D)\log(\varepsilon^{-1}). \tag{3.6}$$

Choose $x_1,\dots,x_N$ according to the probability model (1) or (2). Then with probability at least $1-\varepsilon$ the following holds for all coefficient vectors $c\in\mathbb{C}^\Gamma$. Assume $y = \mathcal{F}_Xc+\eta$ with $\|\eta\|_2\le\sigma$. Denote by $c_M$ the truncated vector corresponding to the $M$ largest absolute values of $c$. Then the solution $c^\#$ to the minimization problem (3.1) satisfies

$$\|c^\#-c\|_2\le C_1\frac{\sigma}{\sqrt{N}}+C_2\frac{\|c-c_M\|_1}{\sqrt{M}}. \tag{3.7}$$
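The right-hand side of (3.7) only needs the noise level and the best $M$-term approximation error $\|c-c_M\|_1$. A small sketch for evaluating it follows; the helper names are hypothetical, and using the values $C_1 = 12.04$, $C_2 = 8.77$ (quoted after Theorem 3.1 for $\delta = 1/5$) as defaults is our assumption, valid only when the corresponding restricted isometry condition holds.

```python
import numpy as np

def best_m_term_tail(c, M):
    """Return ||c - c_M||_1, where c_M keeps the M entries of largest modulus."""
    mags = np.sort(np.abs(np.asarray(c)))[::-1]    # moduli in decreasing order
    return float(mags[M:].sum())

def bp_error_bound(c, M, sigma, N, C1=12.04, C2=8.77):
    """Right-hand side of (3.7); the default constants are an assumption (delta = 1/5)."""
    return C1 * sigma / np.sqrt(N) + C2 * best_m_term_tail(c, M) / np.sqrt(M)
```

For an exactly $M$-sparse $c$ the tail vanishes and the bound reduces to $C_1\sigma/\sqrt{N}$.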

Remark 3.1. (a) Choosing $\sigma = 0$ yields uniform exact recovery. Under condition (3.4) BP is able to reconstruct exactly all $f\in\Pi_\Gamma(M)$ from a single sampling set $X$.

(b) Note that condition (3.4) is satisfied if $N\ge C\delta^{-2}M\log^4(D)\log(\varepsilon^{-1})$. Furthermore, (3.6) is probably not optimal. One may conjecture that $N = O(M\log(D/\varepsilon))$ or even $N = O(M\log(D/(M\varepsilon)))$ samples are enough, see also [30].

(c) With a discrete probability model (the variant of (2) outlined in Section 2), Candes and Tao originally obtained a version of Theorem 3.2 (see [6, Lemma 4.3]) where for some parameter $\alpha$ and constant $\rho$ the statement $\delta_M\le\alpha$ holds with probability at least $1-CD^{-\rho/\alpha}$ under the condition $N\ge\alpha^{-1}M\log(D)^6$. Substituting $\varepsilon = CD^{-\rho/\alpha}$ and solving for $\alpha$ yields the condition

$$N\ge C'M\log(D)^5\log(\varepsilon^{-1}). \tag{3.8}$$

It might be possible to adapt the original proof of Candes and Tao also to the continuous probability model (1) although this does not seem straightforward.



4 Orthogonal Matching Pursuit 

In this section we consider the stability of OMP. Since we measure only noisy samples we 
cannot expect to have perfect recovery of a sparse signal, but at least we would like to obtain 
the true support of the sparse coefficient vector and only small deviations of their entries. We 
first provide the analogue of Theorem 2.2 for the noisy case. Unfortunately, we again have to 
restrict to the first iteration because it is still not clear how to deal with the subtle stochastic 
dependency issues arising in the analysis of the further iterations. 

Theorem 4.1. Let $f\in\Pi_\Gamma(M)$ with Fourier coefficients $c$. Let $N\in\mathbb{N}$ and $\tau,\varepsilon\in(0,1)$ such that

$$N\ge CM\tau^{-2}\log(D/\varepsilon). \tag{4.1}$$

Further, let $\sigma>0$ such that

$$\sigma\le\frac{1-\tau}{4}\sqrt{\frac{N}{M}}\,\|c\|_2. \tag{4.2}$$

Choose the random sampling set $X = (x_1,\dots,x_N)$ according to the probability model (1) or (2). Assume that we have given noisy samples $y = (f(x_j))_{j=1}^N+\eta = \mathcal{F}_Xc+\eta$ with $\|\eta\|_2\le\sigma$. Then with probability exceeding $1-\varepsilon$ OMP selects an element of the true support of $c$ in the first step.

If after $M$ steps OMP actually recovers the complete support of $c$ then with probability exceeding $1-\varepsilon$ the reconstructed coefficients $\hat{c}$ satisfy

$$\|\hat{c}-c\|_2\le\sqrt{\frac{2}{N}}\,\sigma. \tag{4.3}$$

From the proof of this theorem one can deduce more precise information about the constant in condition (4.1). Indeed, $N$ has to satisfy the two conditions

$$N\ge 17.89\,M\tau^{-2}\log(8D/\varepsilon)\quad\text{and}\quad\frac{N}{12eM}\ge\ln\big(2(1-1/(4e))^{-1}M/\varepsilon\big).$$

Note that $\|c\|_2\ge\sqrt{M}\min_{j\in T}|c_j|$, where $T = \operatorname{supp}c$. Hence, condition (4.2) is satisfied if

$$\sigma\le\frac{1-\tau}{4}\sqrt{N}\min_{j\in T}|c_j|. \tag{4.4}$$

One expects that this condition (with possibly a different constant) is sufficient for OMP to select an element of the true support $T$ in every step. Hence, the noise level $\sigma$ should not exceed $\sqrt{N}$ times the minimal absolute non-zero coefficient (up to a constant) in order to have recovery of the correct support.

We note that our numerical experiments in Section 7 indicate that under condition (4.1) 
OMP actually selects elements of the true support T also in the further iterations and then 
(4.3) holds. However, we have not yet been able to carry through the corresponding theoretical 
analysis. 

4.1 A uniform result 

The result in the previous section is non-uniform. Let us state also a uniform recovery result 
for OMP extending Theorem 2.3 to the noisy situation. 

Theorem 4.2. Let the random sampling set $X = (x_1,\dots,x_N)$ be chosen according to one of our probability models. Let $\tau,\varepsilon\in(0,1)$ and $\sigma>0$. Assume that

$$N\ge C\tau^{-2}(2M-1)^2\ln(4D'/\varepsilon), \tag{4.5}$$

where $D' = \#\{j-k : j,k\in\Gamma,\ j\ne k\}\le D^2$. Then with probability at least $1-\varepsilon$ the following holds for all $f\in\Pi_\Gamma(M)$ whose Fourier coefficients satisfy

$$\min_{k\in\operatorname{supp}c}|c_k|\ge\frac{2\sigma}{(1-\tau)\sqrt{N}}. \tag{4.6}$$

If OMP is applied on the noisy samples $y = \mathcal{F}_Xc+\eta$ with $\|\eta\|_2\le\sigma$, and stopped once the residual satisfies $\|r_s\|_2\le\sigma$, then the true support of $c$ is recovered and the reconstructed coefficient vector $\hat{c}$ satisfies

$$\|\hat{c}-c\|_2\le\frac{1}{\sqrt{N(1-\tau/2)}}\,\sigma.$$
The above result has the drawback that the number of samples required by (4.5) scales quadratically in the sparsity $M$ rather than linearly as in (4.1). As in the noiseless case, however, one cannot expect to get around the quadratic scaling if one requires uniformity, i.e., recovery by OMP of all $f\in\Pi_\Gamma(M)$ from a single sampling set $X$. Up to perhaps the log-factor, condition (4.5) then seems to be optimal, see [29].

In contrast, BP gives a uniform guarantee if the number of samples is only linear in the 
sparsity up to some log-factors, see Theorem 3.2. Thus, under this requirement, BP seems 
to be the method of choice. However, for certain applications it might be enough to have 
a non-uniform guarantee and then OMP is a good alternative considering that it is usually 
significantly faster and much easier to implement, see also Section 7. 



5 Proof of Theorem 3.2 

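We mainly follow the ideas in [30]. Condition (3.2) for $N^{-1/2}\mathcal{F}_X$ is equivalent to

$$\sup_{|T|\le M}\big\|I_T-N^{-1}\mathcal{F}_{TX}^*\mathcal{F}_{TX}\big\|_{2\to2}\le\delta_M,$$

and we have to prove that this inequality holds for $\delta_M\le\delta$ with high probability. We denote by $z_\ell\in\mathbb{C}^\Gamma$ the vector

$$z_\ell = (e^{-ik\cdot x_\ell})_{k\in\Gamma} \tag{5.1}$$

and by $z_\ell^T$ its truncation to the index set $T\subset\Gamma$. For vectors $y,z$ we define a rank one operator by $(y\otimes z)(x) = \langle x,y\rangle z$. We note that

$$(z_\ell^T\otimes z_\ell^T)(c) = \langle c,z_\ell^T\rangle z_\ell^T = \Big(\sum_{k'\in T}c_{k'}e^{ik'\cdot x_\ell}\Big)\big(e^{-ik\cdot x_\ell}\big)_{k\in T}.$$

Observe that we can write $\mathcal{F}_{TX}^*\mathcal{F}_{TX} = \sum_{\ell=1}^N z_\ell^T\otimes z_\ell^T$. Thus, we have to show that

$$\sup_{|T|\le M}\Big\|I_T-\frac{1}{N}\sum_{\ell=1}^N z_\ell^T\otimes z_\ell^T\Big\|_{2\to2}\le\delta \tag{5.2}$$

with probability at least $1-\varepsilon$. To this end we first consider the expectation of the above expression. Further, we introduce an auxiliary matrix norm,

$$|||A|||_M := \sup_{|T|\le M}\|A_{T,T}\|_{2\to2},$$

where $A_{T,T}$ denotes the submatrix of a matrix $A$ consisting of the columns and rows indexed by $T$. The left-hand side of (5.2) can be written as

$$X_N := \sup_{|T|\le M}\Big\|I_T-\frac{1}{N}\sum_{\ell=1}^N z_\ell^T\otimes z_\ell^T\Big\|_{2\to2} = \Big|\Big|\Big|I-\frac{1}{N}\sum_{\ell=1}^N z_\ell\otimes z_\ell\Big|\Big|\Big|_M.$$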

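The random matrices $N^{-1}(I-z_\ell\otimes z_\ell)$, $\ell = 1,\dots,N$, are stochastically independent. Moreover, it is easy to see that for both probability models (1) and (2) $\mathbb{E}[z_\ell\otimes z_\ell] = I$, so that these matrices have mean zero, and $I-z_\ell\otimes z_\ell$ is symmetric. Then by standard symmetrization techniques, see e.g. [22, Lemma 6.3], we have

$$\begin{aligned}\mathbb{E}X_N &= \mathbb{E}\sup_{|T|\le M}\Big\|\frac{1}{N}\sum_{\ell=1}^N\big(\mathbb{E}[z_\ell^T\otimes z_\ell^T]-z_\ell^T\otimes z_\ell^T\big)\Big\|_{2\to2}\le 2\,\mathbb{E}\Big|\Big|\Big|\frac{1}{N}\sum_{\ell=1}^N\epsilon_\ell\,z_\ell\otimes z_\ell\Big|\Big|\Big|_M\\ &= 2\,\mathbb{E}\sup_{|T|\le M}\Big\|\frac{1}{N}\sum_{\ell=1}^N\epsilon_\ell\,z_\ell^T\otimes z_\ell^T\Big\|_{2\to2},\end{aligned}\tag{5.3}$$

where the $\epsilon_\ell$ are independent symmetric random variables taking values in $\{-1,+1\}$, also jointly independent of the $x_\ell$. Now the core of the proof is the following lemma due to Rudelson and Vershynin [30, Lemma 3.5].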

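Lemma 5.1. Let $z_1,\dots,z_N$, $N\le D$, be (fixed) vectors in $\mathbb{C}^D$ with uniformly bounded entries, $\|z_\ell\|_\infty\le 1$. Then

$$\mathbb{E}\sup_{|T|\le M}\Big\|\sum_{\ell=1}^N\epsilon_\ell\,z_\ell^T\otimes z_\ell^T\Big\|_{2\to2}\le K(M,N,D)\sup_{|T|\le M}\Big\|\sum_{\ell=1}^N z_\ell^T\otimes z_\ell^T\Big\|_{2\to2}^{1/2},$$

where

$$K(M,N,D) = C_0\sqrt{M}\,\log(M)\sqrt{\log(D)}\sqrt{\log(N)}.$$

We remark that the elegant proof of this lemma uses entropy methods, in particular, Dudley's inequality [22, Theorem 11.17] for the maximum of a Gaussian process.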

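Now, as in [30], we denote $E = \mathbb{E}[X_N]$. Using (5.3), taking the expectation only with respect to the variables $\epsilon_\ell$, applying Lemma 5.1 and Hölder's inequality we obtain

$$\begin{aligned}E &\le\frac{2K(M,N,D)}{N}\,\mathbb{E}\sup_{|T|\le M}\Big\|\sum_{\ell=1}^N z_\ell^T\otimes z_\ell^T\Big\|_{2\to2}^{1/2}\le\frac{2K(M,N,D)}{\sqrt{N}}\Big(\mathbb{E}\sup_{|T|\le M}\Big\|\frac{1}{N}\sum_{\ell=1}^N z_\ell^T\otimes z_\ell^T\Big\|_{2\to2}\Big)^{1/2}\\ &\le\frac{2K(M,N,D)}{\sqrt{N}}\Big(\mathbb{E}\sup_{|T|\le M}\Big\|I_T-\frac{1}{N}\sum_{\ell=1}^N z_\ell^T\otimes z_\ell^T\Big\|_{2\to2}+1\Big)^{1/2} = \frac{2K(M,N,D)}{\sqrt{N}}\,(E+1)^{1/2},\end{aligned}$$

where the last inequality follows from the triangle inequality and $\|I_T\|_{2\to2} = 1$. It follows that $E\le\theta$ provided

$$\frac{2K(M,N,D)}{\sqrt{N}}\le\frac{\theta}{\sqrt{1+\theta}}. \tag{5.4}$$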
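The step from $E\le a(E+1)^{1/2}$ to (5.4) is elementary; the following short derivation is our own filling-in of the omitted algebra, writing $a := 2K(M,N,D)/\sqrt{N}$.

```latex
\begin{aligned}
& E \le a\,(E+1)^{1/2} \;\Longrightarrow\; p(E) := E^2 - a^2 E - a^2 \le 0 .\\
& p \text{ is a convex quadratic with } p(0) = -a^2 \le 0, \text{ so } \{E \ge 0 : p(E) \le 0\} = [0, E_+],
  \ E_+ \text{ the larger root.}\\
& \text{If } a \le \theta/\sqrt{1+\theta}, \text{ then } p(\theta) = \theta^2 - a^2(1+\theta) \ge 0
  \ \text{ and } \ \tfrac{a^2}{2} \le \tfrac{\theta^2}{2(1+\theta)} < \theta .\\
& \text{Hence } \theta \text{ lies right of the vertex of } p \text{ with } p(\theta) \ge 0,
  \text{ i.e. } \theta \ge E_+ \ge E .
\end{aligned}
```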



To finish the proof we need to show that the random variable on the left-hand side of (5.2) does not deviate much from its expectation. Inspired by [3] we proceed differently from [31] and use the following version of Talagrand's concentration inequality [32] proved by Klein and Rio in [19].

Theorem 5.2. Let $Y_1,\dots,Y_N$ be a sequence of independent random variables with values in some Polish space $\mathcal{X}$. Let $\mathcal{F}$ be a countable collection of real-valued measurable and bounded functions $f$ on $\mathcal{X}$ with $\|f\|_\infty\le B$ for all $f\in\mathcal{F}$. Let $Z$ be the random variable

$$Z = \sup_{f\in\mathcal{F}}\sum_{\ell=1}^N f(Y_\ell).$$

Assume $\mathbb{E}f(Y_\ell) = 0$ for all $\ell = 1,\dots,N$ and all $f\in\mathcal{F}$. Let $\sigma^2 := \sup_{f\in\mathcal{F}}\sum_{\ell=1}^N\mathbb{E}f(Y_\ell)^2$. Then for $t>0$

$$\mathbb{P}\big(Z\ge\mathbb{E}[Z]+t\big)\le\exp\Big(-\frac{t}{4B}\log\Big(1+2\log\Big(1+\frac{Bt}{2B\,\mathbb{E}[Z]+\sigma^2}\Big)\Big)\Big).$$
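In order to apply the theorem, we observe that

$$\begin{aligned}X_N &= \sup_{|T|\le M}\Big\|I_T-\frac{1}{N}\sum_{\ell=1}^N z_\ell^T\otimes z_\ell^T\Big\|_{2\to2} = \sup_{|T|\le M}\ \sup_{v\in\mathbb{C}^T,\|v\|_2\le1}\ \sup_{w\in\mathbb{C}^T,\|w\|_2\le1}\Big|\frac{1}{N}\sum_{\ell=1}^N\big\langle(I_T-z_\ell^T\otimes z_\ell^T)v,w\big\rangle\Big|\\ &= \sup_{(v,w)\in S_M}\Big|\frac{1}{N}\sum_{\ell=1}^N\big\langle(I-z_\ell\otimes z_\ell)v,w\big\rangle\Big|,\end{aligned}$$

where

$$S_M = \big\{(v,w)\in\mathbb{C}^\Gamma\times\mathbb{C}^\Gamma : \|v\|_2,\|w\|_2\le1,\ \operatorname{supp}v,\operatorname{supp}w\subset T\ \text{for some } T \text{ with } |T|\le M\big\}.$$

Defining

$$f_{v,w}(z) = N^{-1}\langle(I-z\otimes z)v,w\rangle,$$

we obtain

$$X_N = \sup_{(v,w)\in S_M}\Big|\sum_{\ell=1}^N f_{v,w}(z_\ell)\Big|.$$

Clearly, $\mathbb{E}f_{v,w}(z_\ell) = N^{-1}\langle\mathbb{E}(I-z_\ell\otimes z_\ell)v,w\rangle = 0$. Furthermore, for $(v,w)\in S_M$ (with $T$ the corresponding support set) and $z = (e^{-ik\cdot x})_{k\in\Gamma}$ we have

$$\begin{aligned}|f_{v,w}(z)| &= N^{-1}\Big|\sum_{j,k\in T,\,j\ne k}v_j\overline{w_k}\,e^{i(j-k)\cdot x}\Big|\le N^{-1}\sum_{j,k\in T,\,j\ne k}|v_j||w_k|\le N^{-1}\sum_{j\in T}\sum_{k\in T}|v_k||w_{\sigma_j(k)}|\\ &= N^{-1}\sum_{j\in T}\langle|v|,|w^{\sigma_j}|\rangle\le N^{-1}\sum_{j\in T}\|v\|_2\|w\|_2\le\frac{M}{N},\end{aligned}\tag{5.5}$$

where $\{(k,\sigma_j(k)) : j,k\in T\}$ is a reparametrization of $T\times T$ such that $\sigma_j(T) = T$, i.e., $\sigma_j$ is a suitable permutation, and $w^{\sigma_j}$ denotes the corresponding vector of reordered entries of $w$. Above we used the Cauchy-Schwarz inequality in the fifth step. We deduce $\|f_{v,w}\|_\infty\le M/N$
for all $(v,w)\in S_M$.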
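The bound (5.5) can be sanity-checked numerically: for random $x$, random supports $T$ of size $M$ and unit vectors $v,w$ supported on $T$, the value $|f_{v,w}(z)|$ should never exceed $M/N$. The sketch below does this in dimension $d = 1$; the sizes, seed and helper name are arbitrary illustrative choices.

```python
import numpy as np

def f_vw(v, w, z, N):
    """f_{v,w}(z) = N^{-1} <(I - z (x) z) v, w>, with (y (x) z)(x) = <x, y> z
    and <a, b> = sum_k a_k conj(b_k)."""
    Pv = np.vdot(z, v) * z          # (z (x) z) v = <v, z> z; np.vdot conjugates its first argument
    return np.vdot(w, v - Pv) / N   # <v - Pv, w>

rng = np.random.default_rng(0)
D, M, N = 32, 4, 10
Gamma = np.arange(-D // 2 + 1, D // 2 + 1)       # Gamma = {-D/2+1, ..., D/2}
worst = 0.0
for _ in range(200):
    x = rng.uniform(0.0, 2 * np.pi)
    z = np.exp(-1j * Gamma * x)                  # z = (e^{-ikx})_{k in Gamma}, cf. (5.1)
    T = rng.choice(D, size=M, replace=False)     # random support of size M
    v = np.zeros(D, dtype=complex)
    w = np.zeros(D, dtype=complex)
    v[T] = rng.standard_normal(M) + 1j * rng.standard_normal(M)
    w[T] = rng.standard_normal(M) + 1j * rng.standard_normal(M)
    v /= np.linalg.norm(v)
    w /= np.linalg.norm(w)
    worst = max(worst, abs(f_vw(v, w, z, N)))
```

Because $|z_k| = 1$, the diagonal of $I-z\otimes z$ vanishes, which is exactly why only the terms $j\ne k$ appear in (5.5).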

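Next, for $(v,w)\in S_M$, we compute

$$|f_{v,w}(z_\ell)|^2 = N^{-2}\sum_{j,k\in T,\,j\ne k}\ \sum_{j',k'\in T,\,j'\ne k'}v_j\overline{w_k}\,\overline{v_{j'}}w_{k'}\,e^{i(j-k-j'+k')\cdot x_\ell}.$$

Since $x_\ell$ is uniformly distributed on $[0,2\pi]^d$ or on the grid $\{0,2\pi/m,\dots,2\pi(m-1)/m\}^d$ we have $\mathbb{E}\big[e^{i(j-k-j'+k')\cdot x_\ell}\big] = \delta_{j',\,j-k+k'}$ and, hence (with $v_{j-k+k'} := 0$ if $j-k+k'\notin T$),

$$\mathbb{E}|f_{v,w}(z_\ell)|^2 = N^{-2}\sum_{j,k\in T,\,j\ne k}\ \sum_{k'\in T}v_j\overline{v_{j-k+k'}}\,\overline{w_k}w_{k'}\le N^{-2}\|v\|_2^2\sum_{k,k'\in T}|w_{k'}||w_k|\le N^{-2}\|v\|_2^2\,|T|\le M/N^2.$$

In the second step we applied the Cauchy-Schwarz inequality and in the third step a similar estimate as in (5.5). Hence,

$$\sigma^2 = \sup_{(v,w)\in S_M}\sum_{\ell=1}^N\mathbb{E}|f_{v,w}(z_\ell)|^2\le M/N.$$

Theorem 5.2 applies to real-valued functions $f$. Hence, we split into real and imaginary parts $f^{\mathrm{re}}_{v,w} = \operatorname{Re}(f_{v,w})$, $f^{\mathrm{im}}_{v,w} = \operatorname{Im}(f_{v,w})$. Then the estimates above apply also to these functions, i.e.,

$$\|f^{\mathrm{re}}_{v,w}\|_\infty,\ \|f^{\mathrm{im}}_{v,w}\|_\infty\le\frac{M}{N}\quad\text{and}\quad\sigma_{\mathrm{re}}^2,\ \sigma_{\mathrm{im}}^2\le\frac{M}{N}.$$

Denote $Z^{\mathrm{re}} = \sup_{(v,w)\in S_M}\sum_{\ell=1}^N f^{\mathrm{re}}_{v,w}(z_\ell)$ and similarly define $Z^{\mathrm{im}}$. Since $f_{v,-w} = -f_{v,w}$ we have $Z^{\mathrm{re}} = \sup_{(v,w)\in S_M}\big|\sum_{\ell=1}^N f^{\mathrm{re}}_{v,w}(z_\ell)\big|$. By the union bound

$$\mathbb{P}(X_N>\delta)\le\mathbb{P}\big((Z^{\mathrm{re}})^2+(Z^{\mathrm{im}})^2>\delta^2\big)\le\mathbb{P}\Big(Z^{\mathrm{re}}>\frac{\delta}{\sqrt{2}}\Big)+\mathbb{P}\Big(Z^{\mathrm{im}}>\frac{\delta}{\sqrt{2}}\Big).$$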

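Now assume $\mathbb{E}X_N\le\delta/2$, which by (5.4) will be satisfied provided $2K(M,N,D)/\sqrt{N}\le(\delta/2)/\sqrt{1+\delta/2}$, in particular, if

$$\frac{2K(M,N,D)}{\sqrt{N}}\le\frac{\delta}{2\sqrt{3/2}} = \frac{\delta}{\sqrt{6}}.$$

Since $Z^{\mathrm{re}},Z^{\mathrm{im}}\le X_N$, we then also have $\mathbb{E}Z^{\mathrm{re}},\mathbb{E}Z^{\mathrm{im}}\le\delta/2$. Setting $t = \frac{\delta}{\sqrt{2}}-\frac{\delta}{2} = \frac{\sqrt{2}-1}{2}\,\delta =: c\,\delta$ in Theorem 5.2 (with $B = M/N$ and $\sigma^2\le M/N$) we obtain

$$\mathbb{P}(X_N>\delta)\le 2\exp\Big(-\frac{Nc\delta}{4M}\log\Big(1+2\log\Big(1+\frac{c\delta}{1+\delta}\Big)\Big)\Big) = 2e^{-c_0(\delta)N/(4M)},$$

where $c_0(\delta) = c\delta\log\big(1+2\log\big(1+\frac{c\delta}{1+\delta}\big)\big)$. In other words, $X_N\le\delta$ with probability at least $1-\varepsilon$ provided $N\ge 4c_0(\delta)^{-1}M\log(2/\varepsilon)$ and $2K(M,N,D)/\sqrt{N}\le\delta/\sqrt{6}$. Note that $C_2\delta^{-2}\ge c_0(\delta)^{-1}$ for all $\delta\in(0,1)$, where $C_2 = c_0(1)^{-1} = \big(\frac{\sqrt{2}-1}{2}\log\big(1+2\log\big(1+\frac{\sqrt{2}-1}{4}\big)\big)\big)^{-1}\approx 26.84$. With the definition of $K(M,N,D)$ we deduce that $\delta_M\le\delta$ with probability at least $1-\varepsilon$ provided

$$\frac{N}{\log(N)}\ge C_1\delta^{-2}M\log^2(M)\log(D)\quad\text{and}\quad N\ge 4C_2\delta^{-2}M\log(2\varepsilon^{-1}).$$

Both conditions are satisfied once

$$\frac{N}{\log(N)}\ge C\delta^{-2}M\log^2(M)\log(D)\log(\varepsilon^{-1})$$

for some suitable constant $C$. This finishes the proof of Theorem 3.2.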
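The numerical value $C_2\approx 26.84$ and the inequality $C_2\delta^{-2}\ge c_0(\delta)^{-1}$ can be checked directly from the formula for $c_0(\delta)$ as written in the proof (our reading of the constants, with $c = (\sqrt{2}-1)/2$). A minimal pure-Python sketch:

```python
import math

c = (math.sqrt(2) - 1) / 2          # t = (1/sqrt(2) - 1/2) * delta = c * delta

def c0(delta):
    """c_0(delta) = c*delta*log(1 + 2*log(1 + c*delta/(1 + delta)))."""
    return c * delta * math.log(1 + 2 * math.log(1 + c * delta / (1 + delta)))

C2 = 1 / c0(1.0)                    # should be close to 26.84
```

The inequality $c_0(\delta)\ge\delta^2c_0(1)$ holds because $c\delta/(1+\delta)\ge\delta\cdot c/2$ and $u\mapsto\log(1+2\log(1+u))$ is concave with value $0$ at $u = 0$.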
6 Proofs for Orthogonal Matching Pursuit

6.1 Proof of Theorem 4.1

The proof is an extension of the one in [21]. We will use the following result from [17] on the eigenvalues of a submatrix $\mathcal{F}_{TX}$, which is based on the analysis in [28, Lemma 3.3 and Section 3.3].

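Theorem 6.1. Let $T\subset\Gamma$ of size $|T| = M$ and let $x_1,\dots,x_N$ be i.i.d. random variables that are uniformly distributed over $[0,2\pi]^d$ or over the grid $\{0,2\pi/m,\dots,2\pi(m-1)/m\}^d$. Choose $\varepsilon,\delta\in(0,1)$ and assume

$$\frac{\delta^2N}{3eM}\ge\ln\big(c(\delta)M/\varepsilon\big), \tag{6.1}$$

where $c(\delta) = (1-\delta^2/e)^{-1}\le(1-e^{-1})^{-1}\approx 1.582$. Then with probability at least $1-\varepsilon$ the minimal and maximal eigenvalue of $N^{-1}\mathcal{F}_{TX}^*\mathcal{F}_{TX}$ satisfy

$$1-\delta\le\lambda_{\min}(N^{-1}\mathcal{F}_{TX}^*\mathcal{F}_{TX})\quad\text{and}\quad\lambda_{\max}(N^{-1}\mathcal{F}_{TX}^*\mathcal{F}_{TX})\le 1+\delta. \tag{6.2}$$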

Further, we need the following concentration inequality proved in [21]. 

Lemma 6.2. Assume that $c$ is a vector supported on $T$. Further, assume that the sampling set $X$ is chosen according to one of our two probability models. Then for $j\notin T$ and $t>0$ it holds

$$\mathbb{P}\big(|N^{-1}\langle\phi_j,\mathcal{F}_{TX}c\rangle|\ge t\big)\le 4\exp\Big(-N\,\frac{t^2}{4\|c\|_2^2+\frac{4}{3\sqrt{2}}\|c\|_1t}\Big).$$



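Now we can turn to the proof of Theorem 4.1. (Orthogonal) Matching Pursuit selects an element of the support $\operatorname{supp}c =: T$ in the first iteration if

$$\max_{j\notin T}|N^{-1}\langle\phi_j,\mathcal{F}_{TX}c+\eta\rangle|<\max_{j\in T}|N^{-1}\langle\phi_j,\mathcal{F}_{TX}c+\eta\rangle|. \tag{6.3}$$

By the triangle inequality and Cauchy-Schwarz (note also that $\|\phi_k\|_2 = \sqrt{N}$) this will be satisfied if

$$\max_{j\notin T}|N^{-1}\langle\phi_j,\mathcal{F}_{TX}c\rangle|<\|N^{-1}\mathcal{F}_{TX}^*\mathcal{F}_{TX}c\|_\infty-\frac{2}{\sqrt{N}}\|\eta\|_2.$$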

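Assume for the moment that $\lambda_{\min}(N^{-1}\mathcal{F}_{TX}^*\mathcal{F}_{TX})\ge 1-\delta$ for some $\delta\in(0,1)$. (The probability that this happens can be estimated by Theorem 6.1.) This yields

$$\|N^{-1}\mathcal{F}_{TX}^*\mathcal{F}_{TX}c\|_\infty\ge M^{-1/2}\|N^{-1}\mathcal{F}_{TX}^*\mathcal{F}_{TX}c\|_2\ge M^{-1/2}(1-\delta)\|c\|_2.$$

Thus, (6.3) is satisfied if

$$\max_{j\notin T}|N^{-1}\langle\phi_j,\mathcal{F}_{TX}c\rangle|<\frac{1-\delta}{\sqrt{M}}\|c\|_2-\frac{2}{\sqrt{N}}\|\eta\|_2. \tag{6.4}$$

Assuming further that

$$\|\eta\|_2\le\frac{(1-\delta)(1-\tau)}{2}\sqrt{\frac{N}{M}}\,\|c\|_2, \tag{6.5}$$

condition (6.4) becomes true if

$$\max_{j\notin T}|N^{-1}\langle\phi_j,\mathcal{F}_{TX}c\rangle|<\frac{1-\delta}{\sqrt{M}}\,\tau\,\|c\|_2.$$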



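By the concentration inequality in Lemma 6.2 the probability that the above inequality does not hold can be estimated by

$$\begin{aligned}\mathbb{P}\Big(\max_{j\notin T}|N^{-1}\langle\phi_j,\mathcal{F}_{TX}c\rangle|\ge\frac{1-\delta}{\sqrt{M}}\tau\|c\|_2\Big) &\le\sum_{j\notin T}\mathbb{P}\Big(|N^{-1}\langle\phi_j,\mathcal{F}_{TX}c\rangle|\ge\frac{1-\delta}{\sqrt{M}}\tau\|c\|_2\Big)\\ &\le 4D\exp\Big(-\frac{N}{M}\,\frac{(1-\delta)^2\tau^2\|c\|_2^2}{4\|c\|_2^2+\frac{4}{3\sqrt{2}}\|c\|_1M^{-1/2}(1-\delta)\tau\|c\|_2}\Big)\\ &\le 4D\exp\Big(-\frac{N}{M}\,\frac{(1-\delta)^2\tau^2}{4+\frac{4}{3\sqrt{2}}(1-\delta)\tau}\Big).\end{aligned}\tag{6.6}$$

In the last line we used the Cauchy-Schwarz inequality, $\|c\|_1\le\sqrt{M}\|c\|_2$. Now we choose $\delta = 1/2$. Then condition (6.5) becomes (4.2) and

$$\mathbb{P}\Big(\max_{j\notin T}|N^{-1}\langle\phi_j,\mathcal{F}_{TX}c\rangle|\ge\frac{\tau}{2\sqrt{M}}\|c\|_2\Big)\le 4D\exp\Big(-\frac{N\tau^2}{M\big(16+\frac{8}{3\sqrt{2}}\tau\big)}\Big)\le 4D\exp\Big(-\frac{N\tau^2}{CM}\Big).$$

The latter term is less than $\varepsilon/2$ if

$$N>CM\tau^{-2}\log(8D/\varepsilon)$$

with $C = 16+\frac{8}{3\sqrt{2}}<17.89$. Furthermore, by Theorem 6.1 our initial assumption that $\lambda_{\min}(N^{-1}\mathcal{F}_{TX}^*\mathcal{F}_{TX})\ge 1-\delta = 1/2$ fails with probability at most $\varepsilon/2$ if

$$\frac{N}{12eM}\ge\ln\big(2(1-e^{-1}/4)^{-1}M/\varepsilon\big).$$

Altogether, the probability that OMP does not select an element of $T$ in the first step is less than $\varepsilon$ if

$$N\ge CM\tau^{-2}\log(D/\varepsilon)$$

for some suitable constant $C$.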

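Now consider the final statement of the theorem, i.e., assume that OMP has reconstructed the true support $T$ after $M$ steps. Then the reconstructed coefficients are given by $\hat{c} = \mathcal{F}_{TX}^\dagger(\mathcal{F}_{TX}c+\eta)$, where $\mathcal{F}_{TX}^\dagger$ denotes the pseudo-inverse of $\mathcal{F}_{TX}$. Observe that $\mathcal{F}_{TX}^\dagger\mathcal{F}_{TX}c = c$. Hence

$$\|\hat{c}-c\|_2 = \|\mathcal{F}_{TX}^\dagger\eta\|_2\le\|\mathcal{F}_{TX}^\dagger\|_{2\to2}\|\eta\|_2 = \lambda_{\min}(\mathcal{F}_{TX}^*\mathcal{F}_{TX})^{-1/2}\|\eta\|_2\le\frac{\sigma}{\sqrt{N(1-\delta)}} = \sqrt{\frac{2}{N}}\,\sigma,$$

where we used Theorem 6.1 once more.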
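The last estimate rests on $\|\mathcal{F}_{TX}^\dagger\|_{2\to2} = \lambda_{\min}(\mathcal{F}_{TX}^*\mathcal{F}_{TX})^{-1/2}$ for a matrix with full column rank. A quick numerical check in dimension $d = 1$; the support $T$, the sizes and the seed are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 40, 5
x = rng.uniform(0.0, 2 * np.pi, size=N)
T = np.array([-7, -2, 0, 4, 9])                   # a hypothetical support T
F_T = np.exp(1j * np.outer(x, T))                 # F_{TX}, shape (N, M)
c = rng.standard_normal(M) + 1j * rng.standard_normal(M)
eta = rng.standard_normal(N) + 1j * rng.standard_normal(N)
c_hat = np.linalg.pinv(F_T) @ (F_T @ c + eta)     # least squares on the true support
lam_min = np.linalg.eigvalsh(F_T.conj().T @ F_T)[0]
err = np.linalg.norm(c_hat - c)
bound = np.linalg.norm(eta) / np.sqrt(lam_min)    # ||F_TX^+||_{2->2} * ||eta||_2
```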
6.2 Proof of Theorem 4.2

The proof of the uniform recovery result is based on the coherence parameter, which measures the maximum correlation between distinct columns of a matrix $A = (\psi_1|\dots|\psi_D)$ with normalized columns $\|\psi_j\|_2 = 1$, i.e.,

$$\mu := \max_{j\ne k}|\langle\psi_j,\psi_k\rangle|.$$
Based on $\mu$ the following theorem due to Donoho, Elad and Temlyakov [12, Theorem 4.1] analyzes the performance of OMP in the presence of noise.

Theorem 6.3. Assume that $A$ has coherence $\mu$. Suppose that $y = Ac+\eta$ with only $M$ coefficients of $c$ being nonzero and $\|\eta\|_2\le\sigma$. Suppose $(2M-1)\mu<1$ and

$$\sigma\le\frac{1-(2M-1)\mu}{2}\min_{k\in\operatorname{supp}c}|c_k|.$$

If we run OMP until the residual satisfies $\|r_s\|_2\le\sigma$ then the true support of $c$ has been recovered, and consequently OMP has done $M$ iterations. Furthermore, the error between the reconstructed coefficients $\hat{c}$ and the original coefficients satisfies

$$\|\hat{c}-c\|_2\le\frac{1}{\sqrt{1-\mu(M-1)}}\,\sigma.$$

In [21] the following estimate of the coherence of !Fx was proven. 

Lemma 6.4. Let the random sampling set $X = \{x_1, \ldots, x_N\}$ be chosen according to one of our probability models and let $\mu$ be the coherence of the random matrix $N^{-1/2}\mathcal{F}_X$. Then

$$\mathbb{P}(\mu > t) \leq 4D' \exp\big(-N \cdots\big),$$

where $D' = \#\{j - k : j, k \in \Gamma,\ j \neq k\} \leq D^2$.
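The quantity $D'$ enters because each Gram entry of $N^{-1/2}\mathcal{F}_X$ depends only on the difference of the two frequencies. For the normalized columns $\psi_j = N^{-1/2}(e^{ijx_\ell})_{\ell}$, and with the inner product conjugate-linear in the first argument, one has $\langle \psi_j, \psi_k\rangle = N^{-1}\sum_{\ell} e^{i(k-j)x_\ell}$. The Python/NumPy sketch below uses hypothetical helper names and assumes a one-dimensional $\Gamma$. It computes the coherence from these $D'$ values only. For $\Gamma = \{-D/2+1,\ldots,D/2\}$ one gets $D' = 2D-2$.

```python
import numpy as np


def difference_set(gamma):
    """Sorted set {j - k : j, k in Gamma, j != k}; its size is D' in Lemma 6.4."""
    return sorted({j - k for j in gamma for k in gamma if j != k})


def coherence_via_differences(x, gamma):
    """Coherence of N^{-1/2} F_X computed from the D' distinct frequency differences only.

    The (j, k) Gram entry of N^{-1/2} F_X is N^{-1} sum_l exp(i (k - j) x_l),
    so it depends on j and k only through k - j.
    """
    return max(abs(np.mean(np.exp(1j * m * x))) for m in difference_set(gamma))


gamma = list(range(-7, 9))                  # Gamma for D = 16
print(len(difference_set(gamma)))           # D' = 2*16 - 2 = 30
```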

Remark 6.1. In the case of the continuous probability model, the previous estimate can be slightly improved to [21]

$$\mathbb{P}(\mu > t) \leq (1-\kappa)^{-1} D' e^{-\kappa N t^2}, \qquad \kappa \in (0, 1).$$



Now the proof of Theorem 4.2 is a mere application of the above statements. Note that $y = \mathcal{F}_X c + \eta = (N^{-1/2}\mathcal{F}_X)(\sqrt{N} c) + \eta$. Thus, setting $t$ to a suitable multiple of $1/(2M-1)$ in the lemma, solving for $N$, and applying Theorem 6.3 to the coefficient vector $\sqrt{N} c$ shows the assertion.



7 Numerical Experiments 

To illustrate the theoretical results we also conducted numerical experiments. We choose a number of samples $N$, the noise level $\sigma$, the sparsity $M$ and an (even) dimension $D$, and set $\Gamma = \{-D/2+1, \ldots, D/2\}$. Then we repeat the following reconstruction experiment 100 times. We choose a subset $T \subset \Gamma$ uniformly at random among all subsets of size $M$. Then we randomly select the real and imaginary parts of the coefficients $c_k$, $k \in T$, from a standard normal distribution. The sampling points $x_1, \ldots, x_N$ are drawn at random either from the uniform distribution on $[0, 2\pi]$ (probability model (1), labelled NFFT in the plots) or uniformly among all subsets of $\{0, 2\pi/D, \ldots, 2\pi(D-1)/D\}$ of size $N$ (a slight variation of probability model (2) that prevents sampling points from coinciding, labelled FFT). The perturbed samples are given by $y_\ell = \sum_{k \in T} c_k e^{ikx_\ell} + \eta_\ell$, $\ell = 1, \ldots, N$, where the noise vector $\eta$ is chosen uniformly at random on the sphere of radius $\sigma$ in $\mathbb{C}^N$, i.e., $\|\eta\|_2 = \sigma$.

(a) Recovery success rate of the true support. (b) Average error.
Figure 2: Recovery of sparse trigonometric polynomials in dimension $D = 256$ from $N = 50$ noisy samples, $\|\eta\|_2 = \sigma = 0.4$. The sparsity $M$ is varied.
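The random setup just described can be sketched in Python/NumPy as follows. This is an illustrative re-implementation under our own assumptions, not the Matlab code of [20]. The assumptions are the name `make_trial`, the use of `numpy.random.Generator`, and normalizing a complex Gaussian vector to obtain noise that is uniform on the sphere. The last step is valid because a standard complex Gaussian vector is rotation invariant.

```python
import numpy as np


def make_trial(D, M, N, sigma, model, rng):
    """One random instance: support T, coefficients c, points x, matrix F_X, samples y, noise eta."""
    gamma = np.arange(-D // 2 + 1, D // 2 + 1)                # Gamma = {-D/2+1, ..., D/2}
    T = np.sort(rng.choice(D, size=M, replace=False))           # indices into gamma
    c = np.zeros(D, dtype=complex)
    c[T] = rng.standard_normal(M) + 1j * rng.standard_normal(M)
    if model == "NFFT":                                          # continuous model on [0, 2*pi]
        x = rng.uniform(0.0, 2 * np.pi, size=N)
    elif model == "FFT":                                         # N distinct grid points 2*pi*j/D
        x = 2 * np.pi * np.sort(rng.choice(D, size=N, replace=False)) / D
    else:
        raise ValueError("model must be 'NFFT' or 'FFT'")
    F = np.exp(1j * np.outer(x, gamma))                          # F_X with entries exp(i k x_l)
    g = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    eta = sigma * g / np.linalg.norm(g)                          # uniform on sphere of radius sigma
    return gamma, T, c, x, F, F @ c + eta, eta


rng = np.random.default_rng(0)
gamma, T, c, x, F, y, eta = make_trial(256, 10, 50, 0.4, "FFT", rng)
print(np.linalg.norm(eta))  # approximately 0.4
```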

Then we solve the $\ell_1$-minimization problem (3.1) (with the chosen $\sigma$) and run OMP (with precisely $M$ iterations), and for both methods we compute the error between the reconstructed vector $\tilde{c}$ and the original vector $c$. We also test whether the correct support has been recovered.
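The OMP part of this procedure can be sketched in Python/NumPy as below: exactly $M$ iterations, followed by the error and support check. The helper names and the restriction to the continuous sampling model are assumptions for illustration. BP via CVX is not reproduced here.

```python
import numpy as np


def omp_fixed(A, y, M):
    """OMP with exactly M iterations; returns coefficient vector and selected indices."""
    support, r = [], y.astype(complex)
    coef = np.zeros(0, dtype=complex)
    for _ in range(M):
        corr = np.abs(A.conj().T @ r)                  # correlations with the residual
        if support:
            corr[support] = 0.0                        # never reselect a chosen index
        support.append(int(np.argmax(corr)))
        coef = np.linalg.lstsq(A[:, support], y, rcond=None)[0]
        r = y - A[:, support] @ coef                   # update residual
    c = np.zeros(A.shape[1], dtype=complex)
    if support:
        c[support] = coef
    return c, support


def run_experiment(D, M, N, sigma, n_trials=100, seed=0):
    """Average OMP error and support-recovery rate over random trials (continuous model)."""
    rng = np.random.default_rng(seed)
    gamma = np.arange(-D // 2 + 1, D // 2 + 1)
    errors, hits = [], 0
    for _ in range(n_trials):
        T = rng.choice(D, size=M, replace=False)
        c = np.zeros(D, dtype=complex)
        c[T] = rng.standard_normal(M) + 1j * rng.standard_normal(M)
        x = rng.uniform(0.0, 2 * np.pi, size=N)
        F = np.exp(1j * np.outer(x, gamma))
        g = rng.standard_normal(N) + 1j * rng.standard_normal(N)
        y = F @ c + sigma * g / np.linalg.norm(g)      # noise uniform on sphere of radius sigma
        c_tilde, S = omp_fixed(F, y, M)
        errors.append(np.linalg.norm(c - c_tilde))
        hits += set(S) == set(T.tolist())              # correct support recovered?
    return float(np.mean(errors)), hits / n_trials


print(run_experiment(D=256, M=10, N=50, sigma=0.4, n_trials=20))
```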

Figures 2 and 3 show the results for varying sparsity, while in Figure 4 the noise level $\sigma$ is varied. These plots indicate that the BP variant and OMP are both stable under noise, as predicted by the theoretical results. Figure 4 suggests that the correct support set can be recovered even when the noise level reaches the order of the $\ell_2$-energy of the samples of the signal. Moreover, OMP usually performs slightly better than BP. In fact, OMP yields a smaller average reconstruction error and also recovers the correct support more often, despite the fact that theoretically BP gives a uniform recovery guarantee while OMP does not. This might be due to the fact that OMP forces the reconstruction to be $M$-sparse, while BP may result in larger support sets. Furthermore, OMP is much faster than BP (by a factor between 10 and 200 in the examples). For a more detailed comparison of the computation times we refer to [21].

The Matlab toolbox CVX [16] was used for solving (3.1). The examples (including the 
OMP algorithm) are part of the Matlab toolbox [20], which is available online. 

8 Discussion 

We presented theoretical and numerical results concerning the stability of recovery of sparse 
trigonometric polynomials with (a variant of) Basis Pursuit and Orthogonal Matching Pursuit. 
The (non-uniform) recovery Theorem 4.1 for OMP, however, is only partial so far. It remains open to analyze the further iterations after the first step theoretically.

BP has the advantage of giving a uniform guarantee of recovery success, i.e., a single sampling set $X$ may be sufficient to recover all sparse trigonometric polynomials, while it seems that OMP is only able to provide non-uniform recovery results at a reasonably small ratio of the number of samples to the sparsity [29]. (But note the results for variants of OMP in [26, 25, 24].) In practice, however, a non-uniform guarantee might be sufficient, and indeed our numerical experiments show that OMP even slightly outperforms BP on generic (i.e., random) signals.

(a) Average error, $\sigma = 5$, $D = 256$, $N = 50$
Figure 3: Recovery of sparse trigonometric polynomials for different sets of parameters. The sparsity is varied.

Figure 4: Recovery of sparse trigonometric polynomials for different sets of parameters. The noise level $\sigma$ is varied. (For comparison, the average $\ell_2$-norm of the vector of samples of the unperturbed polynomial is approximately 39.4.)

Corollary 3.3 concerning BP also covers the case that the coefficient vector $c$ is not sparse in a strict sense. In this case it bounds the reconstruction error in terms of the best $M$-term approximation error. In principle, we might also apply Theorems 4.1 and 4.2 for OMP to the non-sparse case by letting $\eta = \mathcal{F}_X c_{\Gamma \setminus T}$, i.e., by treating the contribution of the (small) coefficients outside $T$ as noise. However, in most situations conditions (4.4) and (4.6) on the magnitude of the coefficients then become unrealistic. Roughly speaking, they would imply that the smallest coefficient of $c$ in $T$ is significantly larger than the $\ell_2$-norm of the coefficients outside $T$. So a thorough treatment of the non-sparse case for OMP is still open.
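The reduction of the non-sparse case to the noisy sparse case can be illustrated numerically. Keep the $M$ largest coefficients as $c_T$ and treat $\eta = \mathcal{F}_X c_{\Gamma\setminus T}$ as noise. The following Python/NumPy sketch uses a hypothetical helper `split_as_noise` and illustrative compressible coefficients. It shows the splitting and the elementary bound $\|\eta\|_2 \leq \sqrt{N}\,\|c_{\Gamma\setminus T}\|_1$, which holds because every entry of $\eta$ is bounded in modulus by $\|c_{\Gamma\setminus T}\|_1$.

```python
import numpy as np


def split_as_noise(F, c, M):
    """Keep the M largest coefficients as c_T; return the rest as c_out and eta = F @ c_out."""
    T = np.argsort(-np.abs(c))[:M]             # support of a best M-term approximation
    c_T = np.zeros_like(c)
    c_T[T] = c[T]
    c_out = c - c_T                            # coefficients outside T
    eta = F @ c_out                            # effective noise eta = F_X c restricted to Gamma minus T
    return T, c_T, c_out, eta


rng = np.random.default_rng(2)
D, N, M = 64, 40, 5
gamma = np.arange(-D // 2 + 1, D // 2 + 1)
x = rng.uniform(0.0, 2 * np.pi, N)
F = np.exp(1j * np.outer(x, gamma))
# Illustrative compressible vector: random coefficients with decaying magnitudes.
c = (rng.standard_normal(D) + 1j * rng.standard_normal(D)) / (1 + np.arange(D)) ** 2
T, c_T, c_out, eta = split_as_noise(F, c, M)
print(np.min(np.abs(c_T[T])), np.linalg.norm(c_out))   # compare as in the discussion above
```

Comparing $\min_{k\in T}|c_k|$ with $\|c_{\Gamma\setminus T}\|_2$ for such a vector gives a quick impression of how restrictive the magnitude conditions discussed above are.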

OMP is usually faster (and easier to implement) than BP in practice, and the numerical results even indicate that OMP is slightly more stable. So in most practical situations one would probably prefer OMP, despite its lack of a uniform recovery guarantee when the number of samples is only linear in the sparsity.